Summary

Machine vision systems are often considered to be composed of two subsystems: low-level vision and high-level vision. Low-level vision consists primarily of image processing operations performed on the input image to produce another image with more favorable characteristics. These operations may yield images with reduced noise or cause certain features of the image (such as edges) to be emphasized. High-level vision includes object recognition and, at the highest level, scene interpretation. The bridge between these two subsystems is the segmentation system. Through segmentation, the enhanced input image is mapped into a description involving regions with common features, which can be used by the higher-level vision tasks.

There is no general theory of image segmentation. Instead, segmentation techniques are basically ad hoc. They differ mostly in which of the desired properties of an ideal segmenter they emphasize and in how they balance one desired property against another.

These techniques can be categorized in a number of ways, for example local vs. global, parallel vs. sequential, contextual vs. noncontextual, and interactive vs. automatic. In this paper, we group the schemes into three main categories: pixel-based, edge-based, and region-based. Pixel-based segmentation schemes classify pixels based solely on their gray levels. Edge-based schemes first detect local discontinuities (edges) and then use that information to separate the image into regions. Finally, region-based schemes start with a seed pixel (or group of pixels) and then grow or split the seed until the original image is composed of only homogeneous regions.

Because a number of survey papers are available, we will not discuss all segmentation schemes. Rather than a survey, this is a detailed overview: we focus only on the more common approaches, in order to give the reader a flavor of the variety of techniques available while presenting enough detail to facilitate implementation and experimentation.

Introduction 

Machine vision systems are often considered to be composed of two subsystems: low-level vision and high-level vision. Low-level vision consists primarily of image processing operations performed on the input image to produce another image with more favorable characteristics. These operations may yield images with reduced noise or cause certain features of the image (such as edges) to be emphasized. High-level vision includes object recognition and, at the highest level, scene interpretation. The bridge between these two subsystems is the segmentation system. Through segmentation, the enhanced input image is mapped into a description involving regions with common features, which can be used by the higher-level vision tasks. On the one hand, this procedure should be sensitive enough to extract the areas of interest in the image. On the other hand, it should be immune to disturbances caused by irrelevant objects and by noise in the system.

Ideally, a good segmenter should produce regions that are uniform and homogeneous with respect to some characteristic, such as gray tone or texture, yet simple, without many small holes. Further, the boundaries of each segment should be spatially accurate yet smooth, not ragged. Finally, adjacent regions should have significantly different values with respect to the characteristic on which region uniformity is based. These requirements can be stated mathematically as follows:

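If $I$ is the set of all pixels and $P(\cdot)$ is a uniformity predicate defined on groups of connected pixels, a segmentation of $I$ is a partition of $I$ into connected subsets, or image regions, $\{R_1, R_2, \ldots, R_n\}$ such that

$$\bigcup_{l=1}^{n} R_l = I, \qquad R_l \cap R_m = \emptyset \quad \forall\, l \neq m \tag{1}$$

and the uniformity predicate (such as nearly equal gray level) satisfies

$$P(R_l) = \text{True} \quad \forall\, l \tag{2}$$

$$P(R_l \cup R_m) = \text{False} \quad \forall\, R_l \text{ adjacent to } R_m \tag{3}$$

$$(R_l \supseteq R_m) \wedge (R_m \neq \emptyset) \wedge (P(R_l) = \text{True}) \Rightarrow P(R_m) = \text{True} \tag{4}$$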
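Conditions (1)-(3) can be checked mechanically for a labeled image. The following Python sketch (NumPy assumed; the name `is_valid_segmentation` is ours) takes a label array that assigns one integer region label to every pixel, so condition (1) holds by construction, and then tests (2) and (3) for a caller-supplied predicate. It does not check that each region is connected, and condition (4) is a property of the predicate itself, so it is not tested. The predicate in the example, a gray-level range of at most 10, is only an illustrative stand-in for "nearly equal gray level".

```python
import numpy as np


def is_valid_segmentation(image, labels, predicate):
    """Check conditions (1)-(3) for an integer label image of the same shape."""
    img, lab = np.asarray(image), np.asarray(labels)
    assert img.shape == lab.shape
    # (1): every pixel carries exactly one label, so the regions cover I and are
    # pairwise disjoint by construction. Connectedness is NOT checked here.
    regions = {int(k): img[lab == k] for k in np.unique(lab)}
    # (2): every region satisfies the uniformity predicate.
    if not all(predicate(v) for v in regions.values()):
        return False
    # (3): the union of any two adjacent (4-connected) regions violates it.
    pairs = set()
    for a, b in ((lab[:, :-1], lab[:, 1:]), (lab[:-1, :], lab[1:, :])):
        differ = a != b
        pairs |= set(zip(a[differ].tolist(), b[differ].tolist()))
    return all(not predicate(np.concatenate([regions[x], regions[y]])) for x, y in pairs)
```

For example, with `pred = lambda v: v.max() - v.min() <= 10`, labeling a two-valued image by its two gray levels passes, whereas splitting one uniform half into two regions fails condition (3), because their union still satisfies the predicate.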

Because noise destroys homogeneity in a local context, it is not possible to determine a consistent homogeneity of larger regions, and the segmentation results are fragmented. If the noise characteristics are known, however, it is possible to determine homogeneity on statistical grounds. In this case, we must drop the consistency criterion given by equation (4), which states that if a region is homogeneous, then all subsets of this region will also be homogeneous. This means that a region may be judged homogeneous even when some of its subsets are not.

There is no general theory of image segmentation. Instead, segmentation techniques are basically ad hoc. They differ mostly in which of the desired properties of an ideal segmenter they emphasize and in how they balance one desired property against another.

These techniques can be categorized in a number of ways, for example local vs. global, parallel vs. sequential, contextual vs. noncontextual, and interactive vs. automatic. In this paper, we group the schemes into three main categories: pixel-based, edge-based, and region-based. Pixel-based segmentation schemes classify pixels based solely on their gray levels. Edge-based schemes first detect local discontinuities (edges) and then use that information to separate the image into regions. Finally, region-based schemes start with a seed pixel (or group of pixels) and then grow or split the seed until the original image is composed of only homogeneous regions.

Because a number of survey papers are available (Sahoo et al., 1988; Weszka, 1978; Haralick and Shapiro, 1985), we will not discuss all segmentation schemes. Rather than a survey, this is a detailed overview: we focus only on the more common approaches, in order to give the reader a flavor of the variety of techniques available while presenting enough detail to facilitate implementation and experimentation.

Pixel-Based Segmentation Schemes 

Mode Method 

The most widely used segmentation technique is the mode method, which is applicable to images with bimodal histograms, as shown in figure 1. One mode of the histogram corresponds to the gray levels of the object pixels, while the other captures the gray levels of the background pixels. It is assumed that a fixed threshold level exists that separates the background from the objects. The threshold is chosen to be a gray level between the two modes, using any of a number of methods. The two most popular are Gaussian filtering (Jain and Dubuisson, 1992) and Otsu’s method based on discriminant analysis (Otsu, 1979).
Figure 1: A bimodal histogram. One mode represents the background pixels while the other represents the object pixels.


Gaussian filtering algorithm. The simplest segmentation method is based on Bayes decision theory in pattern recognition. The gray-level histogram of the image is computed, and two component densities (corresponding to the object and the background) are then extracted from the mixture density associated with the histogram. It is commonly assumed that both the background and the object densities are Gaussian.

Algorithm: 

1. Compute the mean ($\mu$) and standard deviation ($\sigma$) of the histogram:

$$\mu = \frac{1}{N} \sum_{i=1}^{L} i\,F(i) \tag{5}$$

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{L} F(i)\,(i - \mu)^2} \tag{6}$$

where $F(i)$ is the histogram value for gray level $i$ (out of $L$ gray levels) and $N$ is the number of points in the window.

To avoid division by zero (the deviation is necessarily 0 for one-pixel regions and for regions whose pixels all have the same value), a small positive constant can be added to $\sigma$.

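2. Find a least-squares fit of

$$f(i) = \frac{P_1}{\sqrt{2\pi}\,\sigma_1}\, e^{-(i-\mu_1)^2/2\sigma_1^2} + \frac{P_2}{\sqrt{2\pi}\,\sigma_2}\, e^{-(i-\mu_2)^2/2\sigma_2^2} \tag{7}$$

to the histogram $F(i)$ by adjusting the parameters $P_1, \mu_1, \sigma_1, P_2, \mu_2, \sigma_2$ as follows: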

(i) Smooth the histogram by taking a local weighted average:

$$\bar{F}(i) = \frac{F(i-2) + 2F(i-1) + 3F(i) + 2F(i+1) + F(i+2)}{9} \tag{8}$$

On the smoothed histogram, find the deepest valley $v$ (the lowest value between the two modes) and use it to divide the histogram into two parts. Compute initial estimates of $P_1, \mu_1, \sigma_1, P_2, \mu_2$, and $\sigma_2$ on these two parts (using the original histogram $F(i)$):
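$$P_1 = \frac{1}{N} \sum_{i=1}^{v} F(i), \qquad P_2 = 1 - P_1 \tag{9}$$

$$\mu_1 = \frac{\sum_{i=1}^{v} i\,F(i)}{\sum_{i=1}^{v} F(i)} \tag{10}$$

$$\sigma_1^2 = \frac{\sum_{i=1}^{v} F(i)\,(i - \mu_1)^2}{\sum_{i=1}^{v} F(i)} \tag{11}$$

$$\mu_2 = \frac{\sum_{i=v+1}^{L} i\,F(i)}{\sum_{i=v+1}^{L} F(i)} \tag{12}$$

$$\sigma_2^2 = \frac{\sum_{i=v+1}^{L} F(i)\,(i - \mu_2)^2}{\sum_{i=v+1}^{L} F(i)} \tag{13}$$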

(ii) Use a hill-climbing method to minimize

$$\sum_{i=1}^{L} \left[ f(i) - F(i) \right]^2 \tag{14}$$


(a) Calculate val = |f(i) - F(i)| for i = v, the deepest valley chosen in step (i) or (ii.d).

(b) Calculate left_val = |f(i - 1) - F(i - 1)|.

(c) Calculate right_val = |f(i + 1) - F(i + 1)|.

(d) If left_val < val, move the estimate of the deepest valley to i - 1.

Else, if right_val < val, move the estimate of the deepest valley to i + 1.

Else, the deepest valley is at i.

(e) If the deepest valley was changed in step (d), re-estimate $P_1, \mu_1, \sigma_1, P_2, \mu_2$, and $\sigma_2$ using equations (9-13) and the new value of $v$, then repeat steps (a-d).

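3. After the parameters of the mixture density have been estimated, a pixel with gray level $x$ is assigned to the object if

$$\frac{P_1}{\sigma_1}\, e^{-(x-\mu_1)^2/2\sigma_1^2} > \frac{P_2}{\sigma_2}\, e^{-(x-\mu_2)^2/2\sigma_2^2} \tag{15}$$

The threshold value $t$ is then defined as

$$\frac{P_1}{\sigma_1}\, e^{-(t-\mu_1)^2/2\sigma_1^2} = \frac{P_2}{\sigma_2}\, e^{-(t-\mu_2)^2/2\sigma_2^2} \tag{16}$$

Taking logarithms of both sides of (16) shows that $t$ satisfies

$$\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2}\right) t^2 - 2\left(\frac{\mu_1}{\sigma_1^2} - \frac{\mu_2}{\sigma_2^2}\right) t + \frac{\mu_1^2}{\sigma_1^2} - \frac{\mu_2^2}{\sigma_2^2} + 2 \ln \frac{P_2\,\sigma_1}{P_1\,\sigma_2} = 0 \tag{17}$$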
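The three steps above can be prototyped in a few dozen lines. The Python sketch below (NumPy assumed; the name `gaussian_mixture_threshold` is ours) follows the smoothing kernel (8), the valley split, the estimates (9)-(13), the hill-climbing rules (a)-(e), and the quadratic (17). Several choices are our own assumptions rather than the text's: the valley is searched only between the two highest local peaks of the smoothed histogram, the mixture (7) is multiplied by $N$ so it can be compared with raw counts, (17) is solved with the numerically stable form of the quadratic formula so that nearly equal variances (where the $t^2$ coefficient almost vanishes) are handled, and the valley position is returned if no root lies between the two means.

```python
import numpy as np


def gaussian_mixture_threshold(hist, max_iter=100, eps=1e-6):
    """Threshold a bimodal histogram by fitting two Gaussians (steps 1-3 above)."""
    F = np.asarray(hist, dtype=float)
    L, N = F.size, F.sum()
    levels = np.arange(L)  # gray levels 0..L-1 (the text indexes them 1..L)

    # Eq. (8): 1-2-3-2-1 weighted average, zero-padded at both ends.
    Fs = np.convolve(F, [1, 2, 3, 2, 1], mode="same") / 9.0

    # Deepest valley; assumption: search only between the two highest local peaks.
    peaks = [k for k in range(1, L - 1) if Fs[k - 1] <= Fs[k] > Fs[k + 1]]
    if len(peaks) < 2:
        raise ValueError("histogram does not look bimodal")
    lo, hi = sorted(sorted(peaks, key=lambda k: Fs[k])[-2:])
    v = lo + int(np.argmin(Fs[lo:hi + 1]))

    def usable(split):  # both parts of the split must contain pixels
        return 0 <= split < L - 1 and F[:split + 1].sum() > 0 and F[split + 1:].sum() > 0

    def estimate(split):  # eqs. (9)-(13) on the original histogram
        params = []
        for part in (slice(0, split + 1), slice(split + 1, L)):
            w, x = F[part], levels[part]
            mu = (x * w).sum() / w.sum()
            sigma = np.sqrt(((x - mu) ** 2 * w).sum() / w.sum()) + eps  # eps: avoid sigma = 0
            params.append((w.sum() / N, mu, sigma))
        return params

    def mixture(params):  # eq. (7), scaled by N to match raw counts
        return N * sum(P / (np.sqrt(2 * np.pi) * s) * np.exp(-(levels - mu) ** 2 / (2 * s**2))
                       for P, mu, s in params)

    params = estimate(v)
    for _ in range(max_iter):  # step 2(ii), rules (a)-(e)
        err = np.abs(mixture(params) - F)
        if usable(v - 1) and err[v - 1] < err[v]:
            v -= 1
        elif usable(v + 1) and err[v + 1] < err[v]:
            v += 1
        else:
            break
        params = estimate(v)

    (P1, mu1, s1), (P2, mu2, s2) = params
    # Eq. (17) written as a t^2 + b t + c = 0, solved in the numerically stable form.
    a = 1 / s1**2 - 1 / s2**2
    b = -2 * (mu1 / s1**2 - mu2 / s2**2)
    c = mu1**2 / s1**2 - mu2**2 / s2**2 + 2 * np.log(P2 * s1 / (P1 * s2))
    disc = b * b - 4 * a * c
    if disc < 0:
        return float(v)
    q = -0.5 * (b + np.copysign(np.sqrt(disc), b))
    if q == 0:
        return float(v)
    roots = [c / q] + ([q / a] if a != 0 else [])
    inside = [r for r in roots if mu1 <= r <= mu2]
    return float(inside[0]) if inside else float(v)
```

With two equal-weight, equal-width Gaussian modes at gray levels 60 and 180, the returned threshold lies close to 120, midway between them, as (17) predicts when $P_1 = P_2$ and $\sigma_1 = \sigma_2$.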


Otsu’s algorithm. Otsu’s method of determining a threshold in a bimodal histogram is based on discriminant analysis, in which thresholding is regarded as partitioning the pixels of an image into two classes, $C_0$ and $C_1$, at gray level $t$.

Algorithm: 

$n_i$ = number of pixels at level $i$ (out of $L$ gray levels)

$N$ = total number of pixels = $n_1 + n_2 + \cdots + n_L$

1. The gray-level histogram is normalized and regarded as a probability distribution:

$$p_i = n_i / N \tag{18}$$

$$p_i \geq 0 \tag{19}$$

$$\sum_{i=1}^{L} p_i = 1 \tag{20}$$

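2. Dichotomize the pixels into two classes, $C_0$ and $C_1$, by a threshold at level $k$: $C_0$ contains the pixels with levels $1, \ldots, k$, and $C_1$ those with levels $k+1, \ldots, L$.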

3. Calculate the probabilities of class occurrence:

$$w_0 = \Pr(C_0) = \sum_{i=1}^{k} p_i = w(k) \tag{21}$$

$$w_1 = \Pr(C_1) = \sum_{i=k+1}^{L} p_i = 1 - w(k) \tag{22}$$


4. Calculate the class mean levels:

$$\mu_0 = \sum_{i=1}^{k} i \Pr(i \mid C_0) = \sum_{i=1}^{k} \frac{i\,p_i}{w_0} = \frac{\mu(k)}{w(k)} \tag{23}$$

$$\mu_1 = \sum_{i=k+1}^{L} i \Pr(i \mid C_1) = \sum_{i=k+1}^{L} \frac{i\,p_i}{w_1} = \frac{\mu_T - \mu(k)}{1 - w(k)} \tag{24}$$

where $w(k) = \sum_{i=1}^{k} p_i$ and $\mu(k) = \sum_{i=1}^{k} i\,p_i$ are the zeroth- and first-order cumulative moments of the histogram up to the $k$th level, and

$$\mu_T = \mu(L) = \sum_{i=1}^{L} i\,p_i \tag{25}$$

is the total mean level of the original picture.
5. Calculate the class variances:

$$\sigma_0^2 = \sum_{i=1}^{k} (i - \mu_0)^2 \Pr(i \mid C_0) = \sum_{i=1}^{k} \frac{(i - \mu_0)^2\,p_i}{w_0} \tag{26}$$

$$\sigma_1^2 = \sum_{i=k+1}^{L} (i - \mu_1)^2 \Pr(i \mid C_1) = \sum_{i=k+1}^{L} \frac{(i - \mu_1)^2\,p_i}{w_1} \tag{27}$$


6. In order to evaluate the “goodness” of the threshold $k$, we can use the following discriminant criterion measures (or measures of class separability):

$$\lambda = \frac{\sigma_B^2}{\sigma_W^2}, \qquad \kappa = \frac{\sigma_T^2}{\sigma_W^2}, \qquad \eta = \frac{\sigma_B^2}{\sigma_T^2} \tag{28}$$

where

$$\sigma_W^2 = w_0\,\sigma_0^2 + w_1\,\sigma_1^2 \tag{29}$$

is the within-class variance,

$$\sigma_B^2 = w_0 (\mu_0 - \mu_T)^2 + w_1 (\mu_1 - \mu_T)^2 = w_0 w_1 (\mu_1 - \mu_0)^2 \tag{30}$$

is the between-class variance, and

$$\sigma_T^2 = \sum_{i=1}^{L} (i - \mu_T)^2\,p_i \tag{31}$$

is the total variance.

Note that $\lambda$, $\kappa$, and $\eta$ are equivalent to one another for a given $k$ (for example, $\kappa = \lambda + 1$ and $\eta = \lambda / (\lambda + 1)$) because

$$\sigma_W^2 + \sigma_B^2 = \sigma_T^2 \tag{32}$$


7. The problem is now reduced to an optimization problem: search for the threshold $k$ that maximizes one of the objective functions (the criterion measures). Note that $\sigma_W^2$ and $\sigma_B^2$ are functions of the threshold level $k$, whereas $\sigma_T^2$ is independent of $k$. Further, $\sigma_W^2$ is based on second-order statistics, while $\sigma_B^2$ is based on first-order statistics. Thus, we use $\eta$, since it is the simplest measure with respect to $k$:

$$\eta = \frac{\sigma_B^2}{\sigma_T^2} \tag{33}$$
Since $\sigma_T^2$ is independent of $k$, we can maximize $\eta$ by maximizing $\sigma_B^2(k)$:

$$\sigma_B^2(k) = \frac{\left[\mu_T\, w(k) - \mu(k)\right]^2}{w(k)\left[1 - w(k)\right]} \tag{34}$$

Thus, the optimal threshold $t$ is chosen to be the $k$ that maximizes $\sigma_B^2(k)$.
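Because $w(k)$ and $\mu(k)$ are cumulative sums, equation (34) can be evaluated for every $k$ in a single pass over the histogram. The NumPy sketch below does exactly that. The function name `otsu_threshold`, the 0-based gray levels, and the convention that levels ≤ k form $C_0$ are our own choices, and levels at which one class is empty are simply given a score of zero.

```python
import numpy as np


def otsu_threshold(image, levels=256):
    """Return the level k that maximizes the between-class variance of eq. (34).

    Pixels with gray level <= k form class C0; the rest form C1.
    """
    counts = np.bincount(np.asarray(image).ravel(), minlength=levels)
    p = counts / counts.sum()                 # eq. (18): normalized histogram
    i = np.arange(p.size)                     # gray levels 0..L-1 (the text uses 1..L)
    w = np.cumsum(p)                          # w(k), zeroth-order cumulative moment
    mu = np.cumsum(i * p)                     # mu(k), first-order cumulative moment
    mu_t = mu[-1]                             # eq. (25): total mean level
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * w - mu) ** 2 / (w * (1.0 - w))   # eq. (34)
    sigma_b2[~np.isfinite(sigma_b2)] = 0.0    # w(k) = 0 or 1: one class is empty
    return int(np.argmax(sigma_b2))
```

Thresholding with `image > k` then separates the brighter class $C_1$ from $C_0$. For an image whose pixels are split evenly between levels 50 and 200, every $k$ from 50 to 199 gives the same maximal $\sigma_B^2$, and `np.argmax` returns the first of them.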

The threshold determination methods discussed above work well when the object is large enough to form a distinct mode in the histogram, the gray-level noise distribution (intensity noise) is independent of the gray level, and the noise is spatially uncorrelated. The methods fail when it is difficult to detect the valley bottom, as in images with extremely unequal peaks or with broad, flat valleys. Since peaks tend to become wider and lower as the amount of intensity noise increases, the peaks and valleys can be sharpened somewhat by applying noise-reduction preprocessing.

Another approach to sharpening peaks and valleys is to weight the influence of individual pixels, rather than counting them all equally, when calculating the histogram, as in the gradient-guided methods. Gradient-guided histograms take one of two forms: interior-only or boundary-only.

The interior-only methods (Mason et al., 1975; Panda and Rosenfeld, 1978) take into account only pixels belonging to either the objects or the background (i.e., those pixels having low gradient values), disregarding pixels on boundaries, where the gray level varies rapidly. This should yield a histogram with sharper peaks and deeper valleys. In contrast, the boundary-only methods (Weszka, Nagel, and Rosenfeld, 1974; Watanabe et al., 1974) take into account only pixels belonging to boundaries (i.e., those pixels having high gradient values). This should yield a well-defined unimodal histogram whose peak lies at a suitable constant threshold level.
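A gradient-guided histogram only changes which pixels are counted. In the Python sketch below (NumPy assumed; the name `gradient_guided_histogram` is ours), the gradient is the central-difference estimate from `np.gradient`, and the cutoff `max_grad` separating "low" from "high" gradient is a hypothetical parameter that the text leaves unspecified. Both are our assumptions, not details taken from the cited methods.

```python
import numpy as np


def gradient_guided_histogram(image, max_grad, boundary_only=False, levels=256):
    """Gray-level histogram that counts only low-gradient (interior) pixels,
    or only high-gradient (boundary) pixels when boundary_only is True."""
    img = np.asarray(image)
    gy, gx = np.gradient(img.astype(float))   # central differences along rows, columns
    magnitude = np.hypot(gx, gy)
    keep = magnitude > max_grad if boundary_only else magnitude <= max_grad
    return np.bincount(img[keep].ravel().astype(int), minlength=levels)
```

On an image with a blurred step edge (a column of intermediate gray level between two flat areas), the interior-only histogram drops the intermediate level entirely, while the boundary-only histogram consists of the edge pixels alone.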

Finally, instead of computing a 1D histogram of gray-level values, a 2D histogram or “scatter” diagram can be computed with gray level and gradient as its coordinates. In this case, a good threshold can be selected using clusters of points rather than the modes of a histogram (Katz, 1965; Weszka and Rosenfeld, 1979).

Local methods. Global segmentation techniques such as the mode method are notoriously sensitive to parameters like ambient illumination, object shape and size, noise level, variance of gray levels within the object and background, and contrast (Taxt et al., 1989). When gray values vary widely from one part of the image to another, a single threshold value cannot be used. Further, objects may legitimately have widely different albedos and, as a result, an object in one part of an image may be lighter than the background in another part. Local methods attempt to eliminate these disadvantages by partitioning the image into subimages, determining a threshold for each subimage, and then smoothing between subimages to eliminate discontinuities. An example of this group of methods is the Chow-Kaneko adaptive thresholding method (Chow and Kaneko, 1972), which assigns a different threshold value to each pixel.

Chow-Kaneko Adaptive Thresholding Algorithm

1. Subdivide the image into several subimages. 

2. For each subimage, compute the histogram, smooth 
it, and determine a threshold using the mode method. 

3. Smooth the thresholds among the neighboring 
subimages. 

4. Determine a threshold for each pixel by interpolation. For example, to interpolate the 2 x 2 image

$$\begin{bmatrix} 4 & 7 \\ 7 & 10 \end{bmatrix}$$

into a 4 x 4 image, begin in the column direction and form a 2 x 4 image:

$$\begin{bmatrix} 4 & 5 & 6 & 7 \\ 7 & 8 & 9 & 10 \end{bmatrix}$$

Then, interpolating in the row direction, form the desired 4 x 4 image:

$$\begin{bmatrix} 4 & 5 & 6 & 7 \\ 5 & 6 & 7 & 8 \\ 6 & 7 & 8 & 9 \\ 7 & 8 & 9 & 10 \end{bmatrix}$$

5. Threshold the image using the threshold value 
assigned to each pixel. 
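A compact sketch of the five steps follows (Python with NumPy; all function names are ours). It makes several simplifying assumptions that the text does not: the image dimensions are multiples of the tile size, each tile's mode-method threshold is computed with Otsu’s criterion, no bimodality test is applied to the tiles, step 3 is a plain 3 x 3 average of neighboring tile thresholds, and the interpolation in step 4 stretches the tile grid corner to corner, exactly as in the 4 x 4 example, rather than anchoring values at tile centers.

```python
import numpy as np


def otsu_level(values, levels=256):
    """Mode-method step for one tile, using Otsu's criterion (eq. 34); 0 for a uniform tile."""
    p = np.bincount(values.ravel(), minlength=levels)[:levels] / values.size
    i = np.arange(levels)
    w, mu = np.cumsum(p), np.cumsum(i * p)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu[-1] * w - mu) ** 2 / (w * (1.0 - w))
    return int(np.argmax(np.nan_to_num(sigma_b2, nan=0.0, posinf=0.0, neginf=0.0)))


def interpolate_grid(grid, rows, cols):
    """Step 4: linear interpolation, first in the column direction, then in the row direction."""
    grid = np.asarray(grid, dtype=float)
    gr, gc = grid.shape
    xs = np.linspace(0, gc - 1, cols)
    wide = np.array([np.interp(xs, np.arange(gc), row) for row in grid])       # gr x cols
    ys = np.linspace(0, gr - 1, rows)
    return np.array([np.interp(ys, np.arange(gr), col) for col in wide.T]).T   # rows x cols


def chow_kaneko(image, tile=32, levels=256):
    """Return a boolean mask of pixels above their interpolated local threshold."""
    img = np.asarray(image)
    H, W = img.shape
    assert H % tile == 0 and W % tile == 0, "sketch assumes whole tiles"
    # Steps 1-2: one threshold per tile.
    grid = np.array([[otsu_level(img[r:r + tile, c:c + tile], levels)
                      for c in range(0, W, tile)] for r in range(0, H, tile)], dtype=float)
    # Step 3: average each tile threshold with its 3 x 3 neighborhood (edge-replicated).
    gh, gw = grid.shape
    p = np.pad(grid, 1, mode="edge")
    smooth = sum(p[dy:dy + gh, dx:dx + gw] for dy in range(3) for dx in range(3)) / 9.0
    # Steps 4-5: per-pixel thresholds, then threshold the image.
    return img > interpolate_grid(smooth, H, W)
```

`interpolate_grid([[4, 7], [7, 10]], 4, 4)` reproduces the 4 x 4 matrix of the example. Because no bimodality test is applied, a tile containing only background receives a meaningless threshold; the smoothing of step 3 dilutes such values but does not remove them.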

The biggest determinant of whether a local thresholding method produces an acceptable segmentation is the size of the subwindows. If they are chosen too large, the algorithm will not focus on local properties and will not perform significantly better than global techniques. On the other hand, if they are chosen too small, the histogram of each subwindow yields meaningless statistics, because far fewer pixels participate in it. Unfortunately, an appropriate window size can usually be found only by trial and error.

Even if the window size is chosen well, the grid imposed on the image may not be coherent with the image contents, and thus the threshold values determined within a subwindow would be set at arbitrary locations instead of truly meaningful positions. Purely local techniques are blind to trends in the data that are significantly larger than their element size. Finally, serious errors can occur if, because of noise and bad lighting conditions, grid windows placed entirely on object areas or entirely on background areas happen to yield subhistograms that are judged to be bimodal.

Edge-Based Segmentation Schemes 

Edge-based segmentation schemes also take local information into account, but they do so relative to the contents of the image rather than an arbitrary grid. Each of the methods in this category involves finding the edges in an image and then using that information to separate the regions. The most direct method is the detect-and-link method, in which local discontinuities are first detected and then connected to form longer, hopefully complete, boundaries. The disadvantage of this approach is that the edges are not guaranteed to form closed boundaries, so the subsequent thresholding scheme may merge regions that are not uniform (relative to the uniformity predicate discussed in the introduction).

An improvement over this method is Yanowitz and Bruckstein’s adaptive thresholding method (Yanowitz and Bruckstein, 1989). Like the detect-and-link method, the adaptive thresholding method locates object edges in an image by using the intensity gradient. These edges are used as a guide to determine an initial threshold level for various areas of the image. Local image properties are then used to assign thresholds to the remainder of the image.

Another improvement of the detect-and-link method is the local intensity gradient (LIG) method (Parker, 1991). It is similar to Yanowitz and Bruckstein’s algorithm and works well in practice. We present each algorithm below.

Yanowitz and Bruckstein’s Adaptive Thresholding Algorithm

1. Smooth the image, replacing every pixel by the average gray level of a small neighborhood around it.

2. Derive the gray-level gradient magnitude image from the smoothed original. In discrete images, the gradient is actually computed as an intensity difference over a small distance:

$$G(i,j) = \min_{(\delta_i,\,\delta_j)} \left[ I(i,j) - I(i + \delta_i,\; j + \delta_j) \right]$$

where $I$ is the image being examined, $(\delta_i, \delta_j)$ ranges over the offsets to the eight neighbors, and $G$ is the resulting image of differences.

3. Thin the gradient image, keeping only the points that are local gradient maxima. This should produce one-pixel-wide edges.
4. Sample the smoothed image at these maximal gradient (or edge) points. The gray levels at these points are assumed to be correct threshold values.

5. Interpolate the sampled gray levels over the image. 
The result is a threshold surface, with a (possibly) 
different threshold value for each pixel. 

6. Using the obtained threshold surface, segment the image. If the original pixel value is greater than or equal to the threshold value at that location, set the thresholded value to 1 (white); otherwise, set it to 0 (black). Thus, objects brighter than their surroundings will be set to white and the background to black.
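The six steps can be prototyped as follows. This Python sketch (NumPy assumed; names are ours) departs from the text in three places that we label as assumptions: step 2 uses a central-difference gradient magnitude from `np.gradient` instead of the neighbor-difference formula, thinning keeps pixels that equal the maximum of their 3 x 3 neighborhood and exceed a hypothetical cutoff `grad_min`, and step 5 interpolates by Jacobi relaxation of Laplace's equation with the edge samples held fixed, using a fixed number of iterations instead of a convergence test.

```python
import numpy as np


def box_smooth(img, k=3):
    """Step 1: replace each pixel by the mean of its k x k neighborhood (edge-replicated)."""
    H, W = img.shape
    p = np.pad(img.astype(float), k // 2, mode="edge")
    return sum(p[dy:dy + H, dx:dx + W] for dy in range(k) for dx in range(k)) / (k * k)


def yanowitz_bruckstein(image, grad_min=10.0, iters=500):
    img = np.asarray(image)
    smooth = box_smooth(img)
    # Step 2 (substitute): central-difference gradient magnitude.
    gy, gx = np.gradient(smooth)
    grad = np.hypot(gx, gy)
    # Step 3: keep local maxima of the gradient (3 x 3 neighborhood) above grad_min.
    H, W = grad.shape
    p = np.pad(grad, 1)
    neigh_max = np.max([p[dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)], axis=0)
    edges = (grad >= neigh_max) & (grad > grad_min)
    if not edges.any():
        raise ValueError("no edge points found; try a smaller grad_min")
    # Step 4: sample the smoothed image at the edge points.
    surface = np.full((H, W), smooth[edges].mean())
    surface[edges] = smooth[edges]
    # Step 5 (assumed interpolator): Jacobi relaxation of Laplace's equation,
    # edge samples held fixed, replicated image borders.
    for _ in range(iters):
        q = np.pad(surface, 1, mode="edge")
        avg = (q[:-2, 1:-1] + q[2:, 1:-1] + q[1:-1, :-2] + q[1:-1, 2:]) / 4.0
        surface = np.where(edges, smooth, avg)
    # Step 6: 1 (white) where the pixel is at or above the surface, else 0 (black).
    return (img >= surface).astype(np.uint8)
```

Because each relaxation step replaces a pixel by an average of its neighbors, the surface never leaves the range of the sampled edge values, so a bright object on a dark background is recovered even if the relaxation has not fully converged.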

Local Intensity Gradient Method Algorithm 

The local intensity gradient method (Parker, 1991) is based on the observation that objects in an image produce small regions with a relatively large intensity gradient (at the object boundaries), whereas other areas ought to have a relatively small gradient. It uses small subimages of the gradient image to find local means and deviations. As in the local mode techniques, these regions must be small enough that illumination effects can be ignored.

1. Compute a smooth gradient of the image.

• For all pixels in the N x N image (IM1), compute the minimum difference between the pixel and all eight neighbors. (See the gradient computation in step 2 of Yanowitz and Bruckstein’s algorithm.) Store the result in IM2.

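• Break up IM2 (the gradient array) into subregions, each composed of M x M pixels. Compute the mean value (QIM) and mean deviation (QDEV) for each subregion k:

$$\mathrm{QIM}[k] = \frac{1}{M^2} \sum_{i=1}^{M} \sum_{j=1}^{M} \mathrm{IM2}[i][j] \tag{38}$$

$$\mathrm{QDEV}[k] = \sqrt{\frac{1}{M^2} \sum_{i=1}^{M} \sum_{j=1}^{M} \left( \mathrm{IM2}[i][j] - \mathrm{QIM}[k] \right)^2} \tag{39}$$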

• Smooth both QIM and QDEV. The value of QIM (and QDEV) at each point is replaced by the weighted mean of all the neighboring subregions, using the following weight matrix:

$$\begin{bmatrix} 0.7 & 1.0 & 0.7 \\ 1.0 & 1.5 & 1.0 \\ 0.7 & 1.0 & 0.7 \end{bmatrix}$$


• Interpolate/extrapolate the values of QIM and 
QDEV to fill an N x N region again. Linear interpolation 
is acceptable. This results in an image containing 
estimates of the gradient and deviation at each pixel. 

2. Find the object pixels in the gradient image. Object pixels are defined as outliers in the smoothed image. That is, pixel [i,j] is an object pixel if IM2[i,j] < QIM[i,j] + 3*QDEV[i,j]. Otherwise, [i,j] is unclassified.

3. After sample object pixels are found, thresholds can be identified for the remaining pixels. This can be done with a region-growing procedure that uses the gray levels in the local surroundings and begins at pixels known to belong to the object.

• For all unclassified pixels [i,j], select a gray-level threshold by finding the smallest gray level in the 8-neighborhood (pixel aggregation). If IM1[i,j] is less than this value, it is an object pixel. Repeat until no more object pixels are found.

• (Optional) For all still unclassified pixels, compute 
a threshold as the mean of at least four neighboring object 
pixels (region growing). Repeat until no new object pixels 
are found. 

• (Optional) For all still unclassified pixels, compute 
a threshold as the minimum of at least six neighbors 
(region growing). Repeat until no new object pixels are 
found. 

4. Set all object pixels in IM1 to the value 0 and all 
unclassified pixels to a positive value. IM1 now contains 
the thresholded image. 
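Step 1 is the part of the method fully specified by equations (38) and (39) and the weight matrix, so it is the part sketched here (Python with NumPy; names are ours). Assumptions: image dimensions are multiples of M, QDEV is taken as the root-mean-square deviation of each block, border blocks are smoothed using only the weights of neighbors that exist, and the interpolation back to full size stretches the block grid corner to corner. The classification and region-growing steps are left out.

```python
import numpy as np

# Weight matrix from the text for smoothing the block statistics.
WEIGHTS = np.array([[0.7, 1.0, 0.7],
                    [1.0, 1.5, 1.0],
                    [0.7, 1.0, 0.7]])


def min_neighbor_difference(image):
    """IM2: minimum of I(i,j) - I(neighbor) over the eight neighbors (edge-replicated)."""
    img = np.asarray(image, dtype=float)
    H, W = img.shape
    p = np.pad(img, 1, mode="edge")
    diffs = [img - p[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
             for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    return np.min(diffs, axis=0)


def smooth_blocks(q):
    """Weighted mean of each block and its neighbors; weights outside the grid are dropped."""
    gh, gw = q.shape
    vals, present = np.pad(q, 1), np.pad(np.ones_like(q), 1)
    num, den = np.zeros_like(q), np.zeros_like(q)
    for dy in range(3):
        for dx in range(3):
            num += WEIGHTS[dy, dx] * vals[dy:dy + gh, dx:dx + gw]
            den += WEIGHTS[dy, dx] * present[dy:dy + gh, dx:dx + gw]
    return num / den


def interpolate_grid(grid, rows, cols):
    """Separable linear interpolation of a block grid back to rows x cols pixels."""
    gr, gc = grid.shape
    wide = np.array([np.interp(np.linspace(0, gc - 1, cols), np.arange(gc), r) for r in grid])
    return np.array([np.interp(np.linspace(0, gr - 1, rows), np.arange(gr), c) for c in wide.T]).T


def lig_statistics(image, M):
    """Step 1 of the LIG method: IM2 plus per-pixel estimates of QIM and QDEV."""
    im2 = min_neighbor_difference(image)
    H, W = im2.shape
    assert H % M == 0 and W % M == 0, "sketch assumes whole M x M blocks"
    blocks = im2.reshape(H // M, M, W // M, M)
    qim = blocks.mean(axis=(1, 3))                                              # eq. (38)
    qdev = np.sqrt(((blocks - qim[:, None, :, None]) ** 2).mean(axis=(1, 3)))   # eq. (39)
    qim, qdev = smooth_blocks(qim), smooth_blocks(qdev)
    return im2, interpolate_grid(qim, H, W), interpolate_grid(qdev, H, W)
```

The three arrays returned by `lig_statistics` are the inputs that the outlier test of step 2 compares pixel by pixel.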

Region-Based Segmentation Schemes 

Region-based segmentation schemes attempt to group 
pixels with similar characteristics (such as approximate 
gray level equality) into regions. Conventionally, these 
are global hypothesis testing techniques. The process 
can start at the pixel level or at an intermediate level. 
Generation and filtering of good seed regions of high 
confidence is essential. Given initially poor or incorrect 
seed regions, these techniques usually do not provide any 
mechanism for detecting and rejecting local gross errors 
in situations such as when an initial seed region spans two 
separate surfaces. These techniques can also fail if the 
definition of region uniformity is too strict, such as when 
we insist on approximately constant brightness while in 
reality brightness may vary linearly. Another potential 
problem with region growing schemes is their inherent 
dependence on the order in which pixels and regions are 
examined. Usually, however, differences caused by scan 
order are minor. 
There are two approaches in region-based methods:
region growing and region splitting. In the region 
growing methods, the evaluated sets are very small at the 
start of the segmentation process. The iterative process of 
region growing must then be applied in order to recover 
the surfaces of interest. In the region growing process, the 
seed region is expanded to include all homogeneous 
neighbors and the process is repeated. The process ends 
when there are no more pixels to be classified. Because 
initial seeds are very small, the processing time can be 
minimized by minimizing the number of times an image 
element is used to determine the homogeneity of a region. 
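A minimal region-growing sketch in Python follows (NumPy assumed; names are ours). A region grows through 4-connected neighbors whose gray level lies within a hypothetical tolerance `tol` of the current region mean, and a new seed is taken at the first unclassified pixel in raster order until every pixel is labeled. Because the region mean changes as pixels are added, the result can depend on the scan order, as noted above.

```python
from collections import deque

import numpy as np


def grow_region(img, seed, tol, allowed):
    """Grow a 4-connected region from `seed`, adding neighbors whose gray level
    lies within `tol` of the current region mean."""
    H, W = img.shape
    region = np.zeros((H, W), dtype=bool)
    region[seed] = True
    total, count = float(img[seed]), 1
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < H and 0 <= nc < W and allowed[nr, nc] and not region[nr, nc]
                    and abs(img[nr, nc] - total / count) <= tol):
                region[nr, nc] = True
                total += img[nr, nc]
                count += 1
                frontier.append((nr, nc))
    return region


def segment_by_growing(image, tol):
    """Seed at the first unlabeled pixel in raster order until every pixel is labeled."""
    img = np.asarray(image, dtype=float)
    labels = np.zeros(img.shape, dtype=int)          # 0 = not yet classified
    next_label = 1
    for r, c in np.ndindex(*img.shape):
        if labels[r, c] == 0:
            region = grow_region(img, (r, c), tol, allowed=labels == 0)
            labels[region] = next_label
            next_label += 1
    return labels
```

On an image whose left half holds values near 10 and whose right half holds values near 50, a tolerance of 5 yields exactly two labeled regions.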

In region splitting methods, on the other hand, the 
evaluation of homogeneity is made on the basis of large 
sets of image elements. The process starts with the entire 
image as the seed. If the seed is inhomogeneous, it is split 
into a predetermined number of subregions, typically 
four. The region splitting process is then repeated using 
each subregion as a seed. The process ends when all 
subregions are homogeneous. Because the seeds being 
processed at each step contain many pixels, region 
splitting methods are less sensitive to noise than the 
region growing methods. In both approaches, the iterative structure leads to computationally intensive algorithms.

In the 1970s, Horowitz and Pavlidis developed a hybrid algorithm, the split, merge, and group (SMG) algorithm (Horowitz and Pavlidis, 1976; Chen and Lin, 1991), which incorporates the advantages of both approaches. Because the SMG algorithm begins at an intermediate resolution level, it is more efficient than either pure split or pure merge algorithms. Its major disadvantage, however, is that the resulting segmentation tends to mimic the data structure used to represent the image (a quadtree for 2D images, or an octree or K-tree for 3D images) and comes out too square.

Split, Merge, and Group Algorithm 

1. Initialization phase:

Divide the image into subimages using a quadtree structure, as shown in figure 2. The root of the quadtree corresponds to the whole image. Each node in the quadtree has exactly one parent (except for the root) and four children (except for the leaves). The four children are denoted by the quadrant of the parent to which they correspond (NW, NE, SW, SE). Thus, the image must be $2^n \times 2^n$ pixels. The leaves are at node level 0, and the root is at level $n$. During initialization, the quadtree is built from the root down to a heuristically set initialization level $L_s$, which can be chosen to minimize the expected number of splits and merges.


2. Merge phase:

If the nodes at level $L_s$ are homogeneous, evaluate the homogeneity of their parents at level $L_s + 1$. If a parent node is homogeneous, its four children are cut and the node becomes an endnode.

Repeat until no merges take place or level $n$ is reached (a homogeneous image).

3. Split phase:

If a node at level $L_s$ is inhomogeneous, split it into four children and add them to the quadtree. Evaluate the new endnodes and, if necessary, split again until the quadtree has only homogeneous endnodes.

4. Conversion from quadtree to RAG phase: 

• RAG stands for Region Adjacency Graph. It allows subimages that are adjacent but cannot be merged in the quadtree to be merged.

• This phase consists of extracting the implicit adjacency relations from the quadtree endnodes needed to construct the branches of the RAG. Two neighboring subimages in the quadtree will have common ancestors, i.e., nodes in the quadtree on a higher level from which both endnodes can be reached.

• First, the nearest common ancestor that connects the current endnode with its neighbor is determined. Next, the path from that ancestor to the endnode is mirrored about an axis formed by the common boundary between the adjacent subimages.

5. Grouping phase: 

The now explicit neighbor relationships can be used 
to merge adjacent nodes which have a homogeneous 
union. Grouping strategies include: 

a. Assign the first node of the RAG (corresponding to the subimage in the top left corner) the status of seed. The neighbors of the seed are then evaluated for homogeneity together with the seed. A merge of a neighbor with the seed produces new neighbors, which are evaluated in turn. When no more grouping takes place, the seed is rendered inactive and a new seed (the first unprocessed node) is assigned. The grouping phase ends when all remaining RAG nodes have become inactive.

b. Sequential grouping: the seeds are chosen based on their size, with the first seed being the largest subimage, and so on. A disadvantage of this approach is that, because of the size of the first seed, these regions tend to grow beyond their “actual” boundaries, annexing all fuzzy border areas.

c. Parallel grouping: assign a number of active 
seeds at the start of the grouping phase. Now only direct 
neighbors of a seed are grouped if possible. New 
neighbors have to wait for evaluation until the seed is 
processed again. Active seeds are processed successively 
until none remain. The growing of seeds will be bounded 
by other seeds. 

Grouping strategy (a) is sufficient in practice. 

6. Postprocessing of the RAG phase: 

• If subimages are too small, merge them with their nearest neighbor. It is difficult to interpret very small regions as objects, and since there is usually a relatively large number of them, their presence increases the computational burden on later stages of processing.

• Exploit prior knowledge about the application 
problem to improve the segmentation. 
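The following Python sketch (NumPy assumed; names are ours) walks through the initialization, merge, split, RAG, and grouping phases on a $2^n \times 2^n$ image, with the uniformity predicate taken to be a gray-level range of at most `tol`, an illustrative choice. Two simplifications are assumptions rather than part of the algorithm as described: the RAG is built by comparing the labels of adjacent pixels instead of mirroring quadtree paths, and only grouping strategy (a) is implemented, with endnodes ordered by their top-left corner. The postprocessing phase is omitted.

```python
import numpy as np


def smg_segment(image, init_size, tol):
    """Split, merge, and group on a 2^n x 2^n image; returns one group id per pixel."""
    img = np.asarray(image, dtype=float)
    n = img.shape[0]
    assert img.shape == (n, n) and n & (n - 1) == 0, "image must be 2^n x 2^n"

    def homogeneous(r, c, s):  # uniformity predicate: small gray-level range
        block = img[r:r + s, c:c + s]
        return block.max() - block.min() <= tol

    # 1. Initialization: quadtree cut down to blocks of size init_size.
    leaves = {(r, c, init_size) for r in range(0, n, init_size) for c in range(0, n, init_size)}

    # 2. Merge: replace four sibling endnodes by their parent while the parent is homogeneous.
    size, merged = init_size, True
    while merged and size < n:
        merged = False
        for r in range(0, n, 2 * size):
            for c in range(0, n, 2 * size):
                kids = {(r, c, size), (r, c + size, size),
                        (r + size, c, size), (r + size, c + size, size)}
                if kids <= leaves and homogeneous(r, c, 2 * size):
                    leaves = (leaves - kids) | {(r, c, 2 * size)}
                    merged = True
        size *= 2

    # 3. Split: divide inhomogeneous endnodes into four children until all are homogeneous.
    final, stack = [], list(leaves)
    while stack:
        r, c, s = stack.pop()
        if s == 1 or homogeneous(r, c, s):
            final.append((r, c, s))
        else:
            h = s // 2
            stack += [(r, c, h), (r, c + h, h), (r + h, c, h), (r + h, c + h, h)]

    # 4. RAG: adjacency read off a label image instead of mirrored quadtree paths.
    final.sort()  # raster order: top-left endnode first
    labels = np.empty((n, n), dtype=int)
    for k, (r, c, s) in enumerate(final):
        labels[r:r + s, c:c + s] = k
    adj = [set() for _ in final]
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        for x, y in zip(a.ravel().tolist(), b.ravel().tolist()):
            if x != y:
                adj[x].add(y)
                adj[y].add(x)

    # 5. Grouping, strategy (a): grow each seed while the union stays homogeneous.
    lo = [img[r:r + s, c:c + s].min() for r, c, s in final]
    hi = [img[r:r + s, c:c + s].max() for r, c, s in final]
    group = [-1] * len(final)
    for seed in range(len(final)):
        if group[seed] != -1:
            continue
        group[seed], g_lo, g_hi = seed, lo[seed], hi[seed]
        frontier = list(adj[seed])
        while frontier:
            k = frontier.pop()
            if group[k] == -1 and max(g_hi, hi[k]) - min(g_lo, lo[k]) <= tol:
                group[k], g_lo, g_hi = seed, min(g_lo, lo[k]), max(g_hi, hi[k])
                frontier.extend(adj[k])
    return np.array(group)[labels]
```

With a 100-valued strip along the top two rows of an 8 x 8 image of 10s, for example, the top quadrants are inhomogeneous and stay split, but grouping reunites the strip's blocks into one region even though they belong to different quadtree parents.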

Concluding Remarks 

The goal of image segmentation is a domain-independent decomposition of an image into distinct regions that are uniform in some measurable property such as brightness, color, or texture. Unfortunately, natural scenes often contain feature gradients, highlights, shadows, textures, and small objects with fine geometric structure, all of which make the process of producing useful segmentations extremely difficult. We have presented some of the techniques that attempt to deal with these difficulties. Although they produce reasonable segmentations in many situations, at some point the local ambiguities and errors introduced by the segmentation process can be resolved only by application-specific knowledge.

Since the quality of the above segmentation techniques 
depends on the type of image the technique is being 
applied to, we will end this overview with a summary of 
what type of image each technique works best on. 

• The mode method is applicable to images with bimodal histograms in which the modes are fairly distinct (well separated and sharp) and of nearly equal size. It does not work well if the gray-level noise distribution depends on the gray level or is spatially correlated.

• Local methods, such as the Chow-Kaneko method, are 
applicable to images in which the ambient illumination 
may vary in gray level from one part of the image to 
another or when one part of an image may be lighter than 
the background in another (as long as the contrast in each 
area is adequate). The major disadvantage of local 
methods is that it is difficult to choose an appropriate 
window size which localizes the illumination variation 
yet considers a large enough area to yield meaningful 
statistics. Also, even if the window size is chosen well, 
the grid imposed on the image may not be coherent with 
the image contents and thus threshold values determined 
within a subwindow would be set at arbitrary positions 
instead of being placed in truly meaningful positions. 

• Because the quality of the segmentation depends on the quality of the edge detector, edge-based schemes work best on images in which the edges are easily detectable, that is, images with good local contrast (over a 5 x 5 pixel area or less). They do not work well with images in which the noise forms well-defined edges.

• Region-based schemes work well for images with an obvious homogeneity criterion (such as nearly equal gray level). These schemes also tend to be less sensitive to noise, since homogeneity is typically determined statistically. Their disadvantages are that the initial split level must be chosen well (otherwise the technique can be very slow) and that the segmented image tends to mimic the data structure used to represent the image and is thus too square.